CHAPTER I


BACKGROUND  AND  STATEMENT  OF  PROBLEM 
Introduction

Experimental  studies  have  demonstrated  that  word  lists 
retain  intelligibility  when  subjected  to  distortion  by  infi- 
nite peak  clipping  (Licklider  and  Pollack,  1948;  Licklider, 
Bindra,  and  Pollack,  1948).  These  findings  have  been  inter- 
preted to  mean  that  the  temporal  pattern  of  zero-axis  crossings 
of  the  speech  waveform  provides  information  that  is  in  itself 
sufficient for perception of speech. A corollary inference
is  that  clues  provided  by  relative  amplitude  variations  in 
the  speech  waveform  may  be  discarded  without  degradation  of 
intelligibility. These interpretations of the effects of
infinite peak clipping are of considerable theoretical
importance: to the extent that they are true they describe an
invariant factor in speech intelligibility. However, there
are  two  reasons  to  suspect  that  these  generalizations  may 
need  qualification.  One  reason  is  that  the  speech  sample 
used  in  the  early  studies  may  have  provided  contextual  clues 
to  word  perception  in  addition  to  the  frequency  and  amplitude 
clues  inherent  in  the  speech  wave.  Thus,  with  multiple  clues 
operating, the role of one physical parameter is more
difficult to assess. A second reason to question the
generalizations about zero-axis crossing and relative amplitude clues
is  that  these  clues  may  assume  different  importance  for  differ- 
ent types  of  speech.  Factors  that  affect  intelligibility  of 
word  lists  may  differ  from  those  that  affect  individual  sounds; 
moreover,  it  is  possible  that  these  factors  may  assume  differ- 
ent importance  among  different  individual  speech  sounds. 

It  may  be  argued  that  speech  communication  generally 
involves  units  larger  than  individual  speech  sounds,  and  that 
word  intelligibility  provides  a closer  approximation  to  in- 
telligibility of  conversational  speech.  It  may  be  true  that 
whole  words,  with  contextual  clues  free  to  operate,  have  higher 
face  validity  as  representative  of  speech  in  general.  However, 
speech  communication  situations  do  exist--in  military  communi- 
cations, for  instance--in  which  identification  of  specific 
speech  sounds  may  be  essential.  Moreover,  fundamental  gen- 
eralizations about  factors  that  determine  intelligibility  of 
speech  should  be  capable  of  verification  on  molecular  levels 
(figuratively)  such  as  individual  phonemes,  consonant  clusters, 
and  syllables,  as  well  as  on  the  molar  level  of  words  and 
continuous  discourse. 

The  focus  of  this  study  is  on  the  individual  speech 
sound.  Specifically,  the  present  study  represents  an  attempt 
to  determine  the  effects  of  infinite  peak  clipping  on 
intelligibility of specific consonants. It also represents
an  attempt  to  minimize  the  opportunity  for  contextual  clues 
to  operate. 


Infinite  Peak  Clipping 

Definition

Peak  clipping  is  a form  of  distortion  in  which  ampli- 
tude is  prevented  from  exceeding  a certain  maximum  limit. 
Amplitude  is  held  constant  during  times  when,  in  a linear 
system, it would exceed the limiting level. When the
positive and negative-going parts of the speech wave are equally
limited, the effect is said to be "symmetrical peak clipping."
If a wave thus limited is reamplified so that the clipped
output wave equals the amplitude of the peaks of the original
wave (before clipping), and if this operation is continued
through several successive stages of clipping and
reamplification until the waveform is approximately rectangular in
shape, the operation is called "infinite peak clipping." Line
B of Figure 1 illustrates the result of infinite peak
clipping on the waveform shown on Line A. Two characteristics of
the clipped wave may be noted on this figure: the points (in
time) that correspond to axis crossings in the undistorted
wave are the same in the clipped wave, but the amplitudes
have only two values, a single maximum excursion above and
below the axis. In addition, all indications of differences
in rise-time or fall-time, or of changes that did not cross
the axis, are eliminated in the clipped wave.

Figure 1. A comparison of a complex wave before and after
infinite peak clipping (after Licklider, Bindra, and Pollack,
1948).

A. Undistorted

B. Same wave following infinite peak clipping
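
The definition above can be illustrated numerically. The following is a minimal sketch, not the apparatus used in this study: the test wave, the sampling rate, the clipping limit, and all function names are assumptions chosen only for illustration. It shows that keeping only the polarity of each sample leaves the axis crossings where they were while reducing the amplitude to two values, and that repeated clipping and reamplification approaches the same rectangular result.

```python
import math


def infinite_clip(samples, level=1.0):
    """Keep only the polarity of each sample: +level or -level."""
    return [level if s >= 0 else -level for s in samples]


def zero_crossings(samples):
    """Indices at which the polarity differs from the previous sample."""
    return [i for i in range(1, len(samples))
            if (samples[i - 1] >= 0) != (samples[i] >= 0)]


def clip_and_reamplify(samples, limit, stages):
    """Symmetrical clipping at +/-limit followed by reamplification to the
    original peak, repeated for the given number of stages."""
    peak = max(abs(s) for s in samples)
    out = list(samples)
    for _ in range(stages):
        out = [max(-limit, min(limit, s)) for s in out]
        gain = peak / max(abs(s) for s in out)
        out = [s * gain for s in out]
    return out


# Hypothetical complex wave: two sinusoids sampled at 8000 Hz (assumed values).
FS = 8000
wave = [math.sin(2 * math.pi * 200 * n / FS)
        + 0.5 * math.sin(2 * math.pi * 950 * n / FS + 0.3)
        for n in range(400)]

clipped = infinite_clip(wave)
staged = clip_and_reamplify(wave, limit=0.1, stages=8)
```

In this sketch `zero_crossings(clipped)` equals `zero_crossings(wave)`, which is the property the clipped wave of Figure 1 is meant to show.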

Relevant  literature 

Effects on intelligibility. The most important study
on  the  intelligibility  of  speech  distorted  by  infinite  clip- 
ping was  that  of  Licklider  and  Pollack  (1948).  Phonetically 
balanced  word  lists  recorded  on  discs  by  one  talker  were  sub- 
jected to  infinite  peak  clipping  and  presented  to  five  lis- 
teners in  a series  of  twenty-five  sessions.  Scores  averaged 
approximately  sixty-eight  percent  in  the  first  listening 
session,  but  improved  in  succeeding  sessions.  Intelligibility 
scores of over ninety percent were achieved by the
twenty-fifth session. The authors felt that this increase did not
stem from the listeners' increased familiarity with PB
word lists but instead represented learning to understand
clipped speech. This suggests that statements regarding the
intelligibility  of  clipped  speech  should  specify  the  observers' 
familiarity  with  clipped  speech. 

The  effects  of  peak  clipping  in  combination  with  other 
forms  of  distortion  were  also  studied.  Integration  of  the 
waveform,  the  equivalent  of  a low-pass  filter  slope  of  six 
dB  per  octave,  severely  degrades  intelligibility  of  PB-50 
word  lists,  if  the  integration  occurs  prior  to  clipping. 
Differentiation  of  the  waveform,  the  equivalent  of  a high- 
pass  filter  slope  of  six  dB  per  octave,  performed  before 
clipping, results in intelligibility slightly better than
with clipping alone. Integration or differentiation of the
waveform following the clipping had no effect on
intelligibility (Licklider and Pollack, 1948). When combined with
distortion by reduction of sound duration, subsequent
clipping had little effect on the intelligibility of vowels, but
did reduce the intelligibility of consonants in CVC and VC
syllables (Ahmed and Fatehchand, 1959).
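
The six dB per octave slopes attributed above to differentiation and integration can be checked with a discrete approximation. This is a sketch under stated assumptions: the first difference and the running sum stand in for the analog differentiating and integrating circuits, and the sampling rate and test frequencies are arbitrary illustrative values, not taken from the studies cited.

```python
import math


def first_difference(x):
    """Discrete stand-in for differentiation: y[n] = x[n] - x[n-1]."""
    return [x[n] - x[n - 1] for n in range(1, len(x))]


def running_sum(x):
    """Discrete stand-in for integration: y[n] = x[0] + ... + x[n]."""
    total, out = 0.0, []
    for sample in x:
        total += sample
        out.append(total)
    return out


def gain_db(process, freq, fs=8000, n=8000):
    """Gain of `process` for a unit-amplitude tone, in dB.

    Half the peak-to-peak range is used so that any constant offset
    left by the running sum does not affect the estimate.
    """
    tone = [math.sin(2 * math.pi * freq * k / fs) for k in range(n)]
    out = process(tone)
    return 20 * math.log10((max(out) - min(out)) / 2)


# Change in gain over one octave (100 Hz to 200 Hz, assumed test tones).
diff_slope = gain_db(first_difference, 200) - gain_db(first_difference, 100)
integ_slope = gain_db(running_sum, 200) - gain_db(running_sum, 100)
```

Under these assumptions `diff_slope` comes out close to +6 dB and `integ_slope` close to -6 dB, that is, a high-pass and a low-pass tilt respectively.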

Spectral  effects.  In  attempting  to  predict  or  to  ex- 
plain the  effects  of  infinite  peak  clipping  under  various  con- 
ditions, it  would  be  helpful  to  know  the  effects  of  clipping 
on  the  spectrum  of  speech.  Unfortunately,  the  information 
available  is  scarce  and  somewhat  equivocal.  Only  one  report 
gives empirical measurements: Licklider, Bindra, and Pollack
(1948), reporting the results of an intensity-frequency-time
spectral analysis of the word "shoe-bench," both undistorted
and peak clipped, state ". . . although many of the details of
the pattern are changed by infinite peak clipping, the general
plan of the terrain is by no means rendered unrecognizable.
The main concentrations of low-frequency and of high-frequency
energy are still in the same places despite the rearrangement
of minor peaks." This report, even while minimizing the
effects of clipping on the spectrum, does support the
contention that changes do occur in the spectrum under infinite
peak clipping.

Two  theoretical  papers,  analyzing  the  clipping  process 
mathematically, exhibit similar ambiguity. Velichkin (1962)
concludes, ". . . It is evident that amplitude clipping causes
a broadening of the speech spectrum, but this broadening is
slight, even with maximum clipping." Mathematical
calculations by Dukes (1954) confirmed the likelihood that clipping
would not seriously reduce the intelligibility of speech, but
he adds, ". . . it is clear that the results have significance
in respect of very long samples . . . With this formulation
nothing can be deduced about the intelligibility of individual
sounds, except of course that large deviations below the
average must be relatively infrequent." He points out that
in  applying  his  formulas  some  of  the  values  obtained  represent 
averages  over  all  possible  unvoiced  and  voiced  sounds. 

The most serious shortcoming of the spectral
information available, whether empirical or theoretical, is that
it  is  based  on  relatively  long-term  averaged  values.  Apparent- 
ly spectral  changes,  however  small,  do  occur  under  infinite 
peak  clipping,  but  are  obscured  by  averaging  over  time.  Spe- 
cific moment-to-moment  changes  are  not  yet  known  in  detail; 
this  makes  it  difficult  to  predict  the  effects  on  brief 
speech  sounds  such  as  consonants. 

Implications  of  previous  studies  on  peak  clipping 

Two  related  implications  of  the  Licklider  and  Pollack 
(1948)  study  have  been  considered  of  major  importance.  These 
concern  the  relative  importance  of  dynamic  amplitude  variation 
and of the temporal pattern of zero-axis crossings in the
speech  waveform. 

Amplitude clues. Since the speech wave subjected to
infinite peak clipping assumes a dichotomous value of
amplitude, the almost infinite range of possible amplitude values
present in the undistorted wave is reduced to one value of
maximum and minimum excursion. This, in effect, means that
information regarding relative amplitudes in the speech wave
is discarded. The implication drawn is that ". . . the
variations in intensity from moment to moment appear not to be
basic cues for the recognition of words," and that "the
so-called dynamic characteristics of speech are not of vital
importance for intelligibility. It is apparently just as well
to reproduce all the fundamental speech sounds (or what is
left of them after clipping) at the same intensity level as it
is to preserve their normal intensities" (Licklider and Pollack,
1948).

Zero-axis crossings. It has been reasoned that since
infinite peak clipping eliminates information regarding
relative amplitudes, as well as on- and off-slope and wave shape,
all that remains in the clipped wave is the temporal pattern
of zero-axis crossings. The inference drawn is that the axis
crossing information is sufficient to provide speech
intelligibility (Licklider, 1950). It should be noted that this is
negative evidence based upon the elimination of other possibilities.
Black and Hixson (1959) were unable to demonstrate a direct
relationship between density of zero-axis crossings and
intelligibility.
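
The notion of zero-axis crossing density can be made concrete with a short sketch. The definition used here (polarity changes per second of signal) is an assumption adopted for illustration and is not necessarily the measure used by Black and Hixson; the test tone and sampling rate are likewise hypothetical.

```python
import math


def zero_crossing_density(samples, fs):
    """Zero-axis crossings per second for a signal sampled at fs Hz."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(samples) / fs)


# One second of a 100 Hz tone; the small phase offset keeps samples off zero.
FS = 8000
tone = [math.sin(2 * math.pi * 100 * n / FS + 0.1) for n in range(FS)]
quiet_tone = [0.01 * s for s in tone]

density = zero_crossing_density(tone, FS)
quiet_density = zero_crossing_density(quiet_tone, FS)
```

A pure tone crosses the axis twice per cycle, so the density here is 200 crossings per second, and it is unchanged when the tone is attenuated. This independence from amplitude is why the crossing pattern survives infinite peak clipping.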

The  implications  of  the  Licklider  and  Pollack  study 
may  be  paraphrased  as  follows: 

1)  The  information  provided  by  the  temporal  pattern 
of  zero-axis  crossings  is  necessary  and  sufficient  for  in- 
telligibility of  speech,  and 

2)  The  information  provided  by  the  dynamic  pattern  of 
amplitude  variations  is  not  necessary  for  intelligibility  of 
speech. 

The  point  of  view  that  has  stimulated  the  present  study 
is that:

1)  While  the  information  provided  by  the  temporal 
pattern  of  zero-axis  crossings  is  probably  necessary,  it  is 
possible  that  there  are  specific  specimens  of  speech  for 
which  it  may  not  be  sufficient,  and 

2)  While  the  information  provided  by  the  dynamic 
pattern  of  amplitude  variations  is  probably  not  necessary  for 
intelligibility  of  many  speech  samples,  it  is  possible  that 
there  are  some  specific  speech  sounds  for  which  this  informa- 
tion may  be  necessary. 

Applications  of  peak  clipping  in  communication 

Thus far infinite peak clipping has been treated in
terms of distortion. It may also be viewed as a means of
simplifying the speech wave for efficient transmission.


Speech  encoding.  Licklider  and  Pollack  (1948)  observe 
that  infinite  peak  clipping  can  reduce  speech  to  a bivariate 
code more efficiently than pulse-modulation procedures. Not
only is the speech wave easily encoded by infinite peak
clipping, but it may also be decoded by the human ear with no
decoding apparatus other than the transducer that
would ordinarily be used in the transmission system.
Southworth (1963) describes several speech digitizing techniques
based  upon  infinite  clipping;  these  include  pulse-number 
modulation  and  delta  modulation  techniques. 

Broadcast transmission. One of the earliest
applications of peak clipping was in broadcast transmission.
Premodulation clipping has been found to increase the efficiency
of power use in broadcast transmission (Kryter, Licklider,
and Stevens, 1947).

Protective limiting. Clipping has been found to be
useful in protecting the ear from high-energy speech peaks
(Pollack and Pickett, 1959). This was investigated in detail
for applications in hearing aid design (Davis et al., 1947).

In all of the above applications, the ultimate
usefulness will depend to some extent upon how intelligible peak
clipped speech remains in relation to the needs of the
specific situation. It is already known that word intelligibility
is satisfactory under peak clipping. To the extent that specific
communication situations demand fine discriminations, the
intelligibility of peak clipped speech must be determined in
greater detail.

Limitations of previous studies on peak clipping

Influence of test materials. Experimental results on
intelligibility are highly dependent on the type of test
materials used. Under identical acoustic conditions, an
intelligibility score may vary by as much as ninety percent
depending  upon  whether  the  test  materials  are  digits  or 
nonsense  syllables  (Miller,  Heise,  and  Lichten,  1951).  Dif- 
ferences in  intelligibility  have  been  associated  with  mean- 
ing, number  of  syllables,  and,  to  a smaller  extent,  syllabic 
stress  (Hirsh,  Reynolds,  and  Joseph,  1954).  Since  the  cur- 
rently available  information  about  intelligibility  of  speech 
subjected  to  infinite  peak  clipping  is  based  on  monosyllabic 
words,  the  intelligibility  of  other  speech  samples,  such  as 
individual  phonemes,  remains  unknown  until  tested  empirically. 

Role of contextual clues. An important factor is the
information provided by the context of sounds in a word or
syllable; this may affect the number of alternative responses
available if a particular sound is unintelligible (Miller, 1951).
If, for instance, a test word is /stræ_/, with the final
consonant unintelligible to the observer, and the test consists
of nonsense syllables, there are more than twenty possible
final consonants available as responses. These include the
nonsense words: stram, strat, strab, strad, stran, strak,
strang, strag, stral, strav, straf, straz, strath (voiced
and voiceless), stras, strash, strach. If, however, the test
vocabulary consisted of real English words, then there would
probably be but one possible response: strap. Thus, if the
final sound were rendered unintelligible by some distortion,
on  the  real  word  test  the  observer's  knowledge  of  the  sta- 
tistical probabilities  of  occurrence  of  certain  sounds,  com- 
binations of  sounds,  and  orders  of  sounds  would  probably  en- 
able him  to  correctly  identify  the  word.  If  a distortion, 
such  as  infinite  peak  clipping,  produced  a systematic  effect 
on  the  intelligibility  of  certain  sounds,  this  could  be  ob- 
scured in  PB-50  scores  if  word  contexts  led  to  correct  iden- 
tification of  words  in  which  the  sounds  occurred. 

As  J.  D.  Harris  (1960)  has  observed: 


In order to uncover the contribution of
each type of physical cue alone, it is
not enough to eliminate it by some
legerdemain in the laboratory. When this is
done . . . intelligibility is often
unaffected. A false conclusion could in
that case be reached . . . that the cue
eliminated is of minor importance in
speech communication. What is necessary
is to eliminate progressively one, two,
and more cues simultaneously.


Since contextual clues may contribute to the intelligibility
of distorted speech, it is considered important in this study
to control contextual clues while studying the effects of
distortion on intelligibility of individual sounds.


Statement of the Problem

This  study  addresses  itself  to  three  questions: 

1) Are individual speech sounds affected in
intelligibility by infinite peak clipping?

2)  If  so,  are  these  effects  equal  from  one  phoneme 
to  another? 

3)  If  individual  speech  sounds  are  not  equally  affected 
in  intelligibility  by  infinite  peak  clipping,  is  there  a sys- 
tematic pattern  in  the  way  in  which  sounds  are  affected? 


CHAPTER  II 


PROCEDURE 


Stimulus  materials 

Selection of stimuli. For this study twelve
consonants /p b t d k g θ ð f v s z/ were chosen which represent
a variety of positions and manners of articulation, as well
as acoustic effects. In particular, the selected consonants
include examples of fricatives and plosives, of
voiced-voiceless cognate pairs, and of lingua-alveolar,
lingua-dental, bilabial, and velar placements. They also represent
the following distinctive features: contrasts of
strident-mellow, tense-lax, continuous-discontinuant, grave-acute, and
compact-diffuse.

Obviously  some  of  these  consonants,  namely  the  plosives, 
cannot  be  spoken  without  an  adjacent  vowel.  Moreover,  it  is 
well  known  that  the  acoustic  characteristics  of  consonants 
can  be  altered  by  the  vowel  context  in  which  they  are  pro- 
duced. Consequently,  in  order  to  counterbalance  the  vowel 
environment, the selected consonants were embedded in a series
of VCV syllables. Specifically, a corpus of ninety-six
nonsense Vowel-Consonant-Vowel words was designed, drawing upon
the twelve consonants in combination with four stressed
vowels  and  the  neutral  schwa  sound.  The  counterbalancing 
was accomplished by selecting stressed vowels /i æ ɑ u/,
which  represent  extremes  of  position  on  a vowel  quadrilat- 
eral, and  combining  them  with  each  of  the  consonants  in  both 
the  pre-vocalic  and  post-vocalic  positions.  By  adding  the 
schwa  to  each  of  these  CV  and  VC  combinations  in  the  initial 
or  final  position,  respectively,  VCV  syllables  were  formed 
which correspond to trochaic or iambic stress patterns.
These  may  be  schematized  as  either  stressed  vowel  + consonant 
+ schwa,  or  schwa  + consonant  + stressed  vowel.  Thus  for  any 
replication  of  the  entire  corpus  there  are  eight  utterances 
of  each  consonant  embedded  in  different  vocalic  environments. 
The  entire  set  of  stimulus  words  is  presented  in  Table  1. 
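
The construction of the corpus can be sketched as follows. This is an illustration of the counterbalancing scheme described above, not the procedure by which the list was actually prepared; the phonetic symbols are an assumed rendering of the twelve consonants and four stressed vowels, and the function and variable names are hypothetical.

```python
from itertools import product

# Assumed symbols for the twelve consonants, four stressed vowels, and schwa.
CONSONANTS = ["p", "b", "t", "d", "k", "g", "θ", "ð", "f", "v", "s", "z"]
STRESSED_VOWELS = ["i", "æ", "ɑ", "u"]
SCHWA = "ə"


def build_corpus():
    """Return (consonant, syllable, stress pattern) for every VCV item."""
    corpus = []
    for consonant, vowel in product(CONSONANTS, STRESSED_VOWELS):
        # Stressed vowel + consonant + schwa (trochaic pattern).
        corpus.append((consonant, vowel + consonant + SCHWA, "trochaic"))
        # Schwa + consonant + stressed vowel (iambic pattern).
        corpus.append((consonant, SCHWA + consonant + vowel, "iambic"))
    return corpus


corpus = build_corpus()
per_consonant = {c: sum(1 for item in corpus if item[0] == c)
                 for c in CONSONANTS}
```

Twelve consonants, four stressed vowels, and two stress patterns give the ninety-six syllables of the corpus, with eight vocalic environments for each consonant.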

Because  there  are  individual  differences  in  the  pro- 
duction of  speech  sounds,  it  was  deemed  desirable  to  use 
three  different  talkers  to  produce  samples  of  the  syllables. 
The  three  adult  male  talkers  were  chosen  from  the  Communica- 
tion Sciences  faculty.  Each  talker  had  extensive  training 
and  experience  in  phonetic  transcription. 

In  preparation  for  recording  spoken  examples  of  the 
selected  stimuli,  a phonetic  transcription  was  made  of  the 
ninety-six  nonsense  syllables.  The  transcriptions  were 
replicated  three  times,  once  for  each  talker,  and  arranged 
in  one  complete  random  order.  On  the  final  copies  of  this 
list of 288 phonetically transcribed syllables, individual
items were identified by talker so that the recordings could
be made in a single session.

Table 1. Vocabulary of nonsense syllables used as stimuli.

isə    æsə    əsi    əsæ    ɑsə    usə    əsɑ    əsu
iθə    æθə    əθi    əθæ    ɑθə    uθə    əθɑ    əθu
ifə    æfə    əfi    əfæ    ɑfə    ufə    əfɑ    əfu
izə    æzə    əzi    əzæ    ɑzə    uzə    əzɑ    əzu
iðə    æðə    əði    əðæ    ɑðə    uðə    əðɑ    əðu
ivə    ævə    əvi    əvæ    ɑvə    uvə    əvɑ    əvu
ipə    æpə    əpi    əpæ    ɑpə    upə    əpɑ    əpu
itə    ætə    əti    ətæ    ɑtə    utə    ətɑ    ətu
ikə    ækə    əki    əkæ    ɑkə    ukə    əkɑ    əku
ibə    æbə    əbi    əbæ    ɑbə    ubə    əbɑ    əbu
idə    ædə    ədi    ədæ    ɑdə    udə    ədɑ    ədu
igə    ægə    əgi    əgæ    ɑgə    ugə    əgɑ    əgu
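
The preparation of the randomized reading list may be sketched as below. The talker labels, the use of integer syllable identifiers, and the fixed random seed are assumptions made for illustration; the study does not state how the random order was generated.

```python
import random


def build_reading_list(n_syllables=96, talkers=("A", "B", "C"), seed=0):
    """Replicate the syllable list once per talker and shuffle the whole
    set into a single random order of (talker, syllable_id) pairs."""
    items = [(talker, syllable_id)
             for talker in talkers
             for syllable_id in range(n_syllables)]
    rng = random.Random(seed)  # fixed seed so the order is reproducible
    rng.shuffle(items)
    return items


reading_list = build_reading_list()
items_per_talker = {t: sum(1 for talker, _ in reading_list if talker == t)
                    for t in ("A", "B", "C")}
```

Because every item carries its talker label, a single recording session yields a tape that is already randomized across talkers.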

Recording  the  stimulus  tape.  For  two  reasons,  the 
spoken  examples  of  stimuli  were  tape  recorded  during  a single 
session.  First,  if  recordings  were  made  separately  a subse- 
quent splicing  or  rerecording  step  would  have  been  necessary 
to  randomize  the  stimuli  produced  by  different  talkers.  How- 
ever, by  recording  all  the  talkers  in  a single  session,  this 
randomization  was  accomplished  prior  to  the  recordings.  Sec- 
ondly, since  the  talkers  were  instructed  to  correct  any  errors 
in  the  production  of  the  stimuli,  they  and  the  experimenter 
provided  an  initial  check  of  each  other's  articulation. 

In  the  recording  session  the  talkers  were  seated  in  a 
series  1200  Industrial  Acoustic  Corporation  room  approximately 
two  feet  from  the  Altec  M-20  microphone  system,  which  was  cou- 
pled to  an  Ampex  model  351-C  full-track  tape  recorder  located 
outside  the  sound-treated  room.  Each  talker-listener  had  a 
complete  list  of  the  stimuli  indicating  which  talker  was  to 
read  each  of  the  words.  Talkers  were  instructed  to  use  the 
carrier  phrase  "now"  for  each  VCV  word  with  no  intervening 
pause.  By  using  such  a phrase  an  opportunity  for  the  talker 
to  reach  a stable  vocal  level  was  provided  before  his  pro- 
duction of  the  VCV  syllable.  The  word  "now"  was  chosen, 
rather than a more conventional "say the word" or "write the
word," because it does not include any of the sounds used as
stimuli. This avoided the possibility of a constant standard
of reference for one or more of the test stimuli. The
experimenter provided a flash of light as a cue for the
beginning of each word in order to space stimuli at four-second
intervals.

Following  the  recording  session,  the  tape  was  edited 
to  remove  errors  and  irrelevant  materials  and  to  insert  an 
identifying  number  before  each  group  of  ten  stimuli.  Since 
one  of  the  talkers  tended  to  use  a slightly  lower  vocal  level 
than  the  others,  the  gain  was  adjusted  during  the  rerecording 
process  to  equalize  the  levels  of  the  speech  peaks  on  the 
carrier  phrase. 

Observations 

Observers. The criteria for selection of observers
were: 1) training in phonetics, and 2) normal speech and
hearing. Hearing sensitivity within normal limits was
determined by means of individual screening tests at 250, 500,
1000, 2000, 4000, and 6000 Hz. This testing was accomplished
with a Beltone model 10 A portable audiometer, set at a
screening level of 15 dB relative to the 1964 I.S.O. standard.

Ten  observers  who  met  the  criteria  were  recruited  from 
among  the  students  and  faculty  associated  with  the  Communi- 
cation Sciences  program  at  the  University  of  Florida.  This 
included eight male and two female observers whose ages ranged
from twenty to fifty years. For each of the observers, the
experimental  session  was  the  first  exposure  to  speech  sub- 
jected to  infinite  peak  clipping. 

Playback system. The complete system as used in the
experimental trials is depicted schematically in Figure 2.
The stimulus tape was played from an Ampex 351-C tape
recorder into one channel of a Marantz model 7 preamplifier with
controls set in the normal flat frequency response position.
The output of this channel was led to a double-pole
double-throw selector control which either connected or bypassed the
clipper. The clipping unit, consisting of three cascaded
printed-circuit amplifier modules operated beyond their
linear range, is described in detail in Appendix A. The
"pass-clip"  switching  control  output  led  to  a Hewlett-Packard 
350-D  decade  attenuator  set.  The  signal  was  then  reampli- 
fied through  the  second  channel  of  the  preamplifier  connect- 
ed to  a Marantz  model  8-B  power  amplifier.  The  output  of  the 
power  amplifier  drove  an  AR-3  acoustic  suspension  loudspeaker 
system  located  in  the  I.A.C.  room.  A dual-beam  Tektronix 
type  564  oscilloscope  was  used  to  constantly  monitor  the 
speech  waveforms  at  the  input  of  the  clipping  unit  and  at  the 
output  of  the  power  amplifier.  Accordingly,  both  undistorted 
and  clipped  waveforms  could  be  observed  simultaneously. 

Playback level. Stimuli were presented to observers
at 70 dB Sound Pressure Level. In order to achieve the desired
level consistently for all experimental trials, a preliminary
calibration was performed. A General Radio sound level meter,
located at the position in the I.A.C. room where observers
would sit during experimental trials, was used to determine
sound pressure levels. The RMS voltage reading at the input
to the loudspeaker which produced a 70 dB Sound Pressure Level
at the observer position was noted and used as the reference
setting during experimental trials. The signal used to set
voltage readings was a 1000 Hz calibration tone. The
calibration tone was recorded on the test tape at a V.U. level
that corresponded to the average V.U. of the talkers' speech
peaks on the tape; the signal source for the recording was a
General Radio model 1304-B beat frequency audio generator.
RMS voltages at the transducer input were determined with a
Ballantine model 321-C A.C. Vacuum Tube Voltmeter. Sound
Pressure Levels were read on the B scale of the sound-level
meter.

Figure 2. Block diagram of equipment.
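
The arithmetic behind such a level calibration can be sketched briefly. The reference pressure of 20 micropascals is the conventional reference for Sound Pressure Level; the function names, the example readings, and the assumption that the playback chain is linear (so that level in dB follows 20 times the logarithm of the voltage ratio) are illustrative and are not taken from the calibration records of this study.

```python
import math

P_REF = 20e-6  # conventional reference pressure for dB SPL, in pascals


def spl_to_pressure(spl_db):
    """RMS sound pressure in pascals for a given Sound Pressure Level."""
    return P_REF * 10 ** (spl_db / 20)


def pressure_to_spl(pressure_pa):
    """Sound Pressure Level in dB for a given RMS pressure in pascals."""
    return 20 * math.log10(pressure_pa / P_REF)


def voltage_for_target(v_measured, spl_measured, spl_target):
    """Loudspeaker input voltage expected to give spl_target, assuming a
    linear chain in which level changes by 20*log10 of the voltage ratio."""
    return v_measured * 10 ** ((spl_target - spl_measured) / 20)


pressure_at_70 = spl_to_pressure(70)
# Hypothetical reading: 1.0 V RMS produced 76 dB SPL at the observer position.
reference_voltage = voltage_for_target(1.0, 76.0, 70.0)
```

With these hypothetical readings, lowering the level by 6 dB calls for roughly half the voltage, which is the kind of reference setting that would then be rechecked between treatment conditions.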

System calibration. In order to determine whether the
present experimental apparatus was comparable with that of
previous studies, a system calibration was performed. This
took the form of a pilot study using a speech sample similar
to that used in the Licklider and Pollack (1948) study.
Details are presented in Appendix B. The apparatus was found
to be comparable.


Conducting experimental trials. Observers were seated
in one of three seats placed at a distance of two and one-half
feet from the loudspeaker. Half of the observers heard
the test list under the "clip" treatment first and "pass"
(undistorted) second; the other five observers heard the lists
with treatment conditions in the reverse order.

The  following  recorded  instructions  were  presented  to 
the  observers  before  the  VCV  list  under  each  treatment  con- 
dition: 


Transcribe  only  the  consonant  in  the  word.  Do 
not transcribe the word "now" which comes
before each word. For example, if you hear
"Now eje," write the symbol for /j/. If you
hear "Now emo," write the symbol for /m/. To
help  you  keep  your  place,  after  each  group  of 
ten  words  the  number  of  the  following  word 
will  be  given. 

Just  prior  to  the  instructions  for  the  "clip"  condition, 
whether  it  came  first  or  second  in  order,  certain  recorded 
materials  were  presented  to  provide  some  familiarization  with 
the  sound  of  clipped  speech.  These  materials  included  one 
unfamiliar passage--a paragraph from an out-dated news
magazine, and two passages which were presumed to be very
familiar: the pledge of allegiance to the flag, and digits
from one to twenty.

A brief rest period was provided between treatment
conditions. At this time the voltage to the loudspeaker was
rechecked and the tape was advanced to the position for the
following treatment. The listening session required about
one hour to complete both treatment conditions.

Data reduction
Analysis of variance. The primary analysis of results
used a design with four factors: Consonant (C), Observer (O),
Talker (T), and Distortion (D). Factor C consisted of the
twelve consonants used in this study, Factor O represented
ten observers, and Factor T represented three talkers. The two
treatments in Factor D were the distorted, or "clip," condition
and the undistorted, or "pass," condition. The latter was
included as the experimental control for the effects of
clipping. The design utilized a mixed model in which the
Consonant and Treatment (Distortion) factors were considered
fixed effects and the Observer and Talker factors were
considered random effects. The structural design is depicted
in Figure 3. The number of correct responses (from zero to
eight) made by a single observer to a talker's productions
of a given consonant was entered in the individual cell. For
purposes of this analysis, differences among vowels and stress
patterns in syllables were ignored; the syllable was viewed
simply as the vehicle for counterbalancing a variety of
coarticulation effects among the test stimuli, and as a
method of limiting the number of response alternatives for the
total VCV vocabulary in order to control the contextual
variable.
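The layout of this four-factor design can be sketched in a few lines of Python. The sketch below only assumes the numbers of levels stated in the text (twelve consonants, ten observers, three talkers, two treatments) and a maximum score of eight per cell; the function name and dictionary keys are illustrative, not part of the original analysis. It lists the degrees of freedom for every main effect and interaction of a fully crossed design.

```python
from itertools import combinations
from math import prod

# Number of levels of each factor, as stated in the text.
LEVELS = {"C": 12, "O": 10, "T": 3, "D": 2}
MAX_SCORE_PER_CELL = 8  # each cell holds 0 to 8 correct responses


def degrees_of_freedom(levels):
    """Return d.f. for every main effect and interaction of a crossed design."""
    df = {}
    names = list(levels)
    for order in range(1, len(names) + 1):
        for effect in combinations(names, order):
            # d.f. of an effect is the product of (levels - 1) over its factors.
            df["x".join(effect)] = prod(levels[f] - 1 for f in effect)
    return df


df = degrees_of_freedom(LEVELS)
n_cells = prod(LEVELS.values())
total_possible = n_cells * MAX_SCORE_PER_CELL

for effect, value in df.items():
    print(f"{effect:8s} {value:4d}")
print("cells:", n_cells, "total possible correct:", total_possible)
```

The degrees of freedom produced this way can be compared line by line with the d.f. column of the analysis of variance summary.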


Figure 3. Structural scheme of factorial design.

Individual comparisons. A factorial design makes possible
the analysis of two or more experimental variables
simultaneously, both of their individual effects and of their
interactions with one another. In an analysis of variance,
the F ratios obtained are evaluated with reference to the
theoretical distribution of F in order to test the
significance of differences among the treatment means.

It is often deemed desirable to investigate the sources
contributing to the significance of a given variable. This
may be accomplished by testing the significance of specific
differences among the means of the levels or categories of a
variable. For instance, in the present study the information
that the Consonant factor had statistical significance would
not indicate which consonants differed from one another, but
comparisons among the individual consonants or groups of
consonants could provide this information.

For enumerative data such as those collected in this
study the chi-square test provides an appropriate test of
significance. The present data represent the number of
correct responses that were actually obtained compared with the
number of correct responses possible. According to Snedecor
(1956) the basic chi-square formula may be verbalized as:
"$\chi^2$ is the sum of such ratios as (deviation squared)/
(hypothetical number)." The factorial chi-square, developed
by Brandt and Snedecor (Brandt, 1949), represents an
alternative method of calculation of chi-square that is easily
adaptable to use in a factorial design. In this method the
sums of squares are multiplied by a chi-square factor to
determine the value of chi-square associated with a given
comparison. The chi-square factor represents $e^2 / [o(e - o)]$,
where $e$ equals the total possible sum of scores and $o$ equals
the obtained sum of scores. The sum of squares associated
with the comparison is $d^2 / [f(s_1^2 + s_2^2)]$, where $d$ equals the
difference between sums of scores of the quantities being
compared, $f$ is the frequency of each item being compared, and
$s_1$ and $s_2$ are the number of items comprising each half of the
comparison. For example, if one sound is compared with another,
$(s_1^2 + s_2^2)$ would equal twice one squared, or two; if six sounds
were compared with six other sounds, $(s_1^2 + s_2^2)$ would equal
twice six squared, or seventy-two. The obtained value of
chi-square is tested for the number of degrees of freedom
appropriate to the effect or interaction for which the
comparison is made.
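The factorial chi-square calculation described above can be written out as a short Python sketch. The function and variable names are illustrative. Two points are assumptions rather than statements from the text: that e and o are taken over the whole experiment (5760 possible responses, and an obtained total equal to the four sums of scores listed in Table 6), and that f is the number of possible responses behind each item being compared.

```python
def chi_square_factor(e, o):
    """e^2 / (o * (e - o)): e = total possible score, o = total obtained score."""
    return e ** 2 / (o * (e - o))


def comparison_sum_of_squares(d, f, s1, s2):
    """d^2 / (f * (s1^2 + s2^2)) for a comparison between two sets of items."""
    return d ** 2 / (f * (s1 ** 2 + s2 ** 2))


def factorial_chi_square(d, f, s1, s2, e, o):
    """Sum of squares for the comparison multiplied by the chi-square factor."""
    return comparison_sum_of_squares(d, f, s1, s2) * chi_square_factor(e, o)


# Assumed totals for the whole experiment (inferred, not quoted from the text):
# 12 consonants x 240 stimuli x 2 treatments = 5760 possible responses;
# obtained total = the four sums of scores listed in Table 6.
E_TOTAL = 5760
O_TOTAL = 1374 + 1350 + 1135 + 1061

# Order-of-treatment comparisons (Table 6): one group of five observers is
# compared with the other, so s1 = s2 = 1 and each group has 1440 possible
# responses (5 observers x 12 consonants x 3 talkers x 8 syllables).
pass_order = factorial_chi_square(d=24, f=1440, s1=1, s2=1, e=E_TOTAL, o=O_TOTAL)
clip_order = factorial_chi_square(d=74, f=1440, s1=1, s2=1, e=E_TOTAL, o=O_TOTAL)
print(round(pass_order, 2), round(clip_order, 2))
```

Under these assumptions the two results fall within rounding of the values 1.60 and 15.26 reported in Table 6, which suggests, but does not prove, that this reading of e, o, and f matches the original computation.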

This method was used to make voiced-voiceless comparisons,
fricative-plosive comparisons, and individual consonant
comparisons. It was also used to test the effect of order of
presentation of treatments.

Stimulus-response matrices. Information about types of
error responses was displayed in confusion matrices which were
compiled for each observer (see Appendix C), similar to those
described by Miller and Nicely (1955). These matrices preserved
specific responses made by observers to each talker's stimuli.
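A stimulus-response (confusion) matrix of the kind described here is simply a tally of responses for each stimulus. The following Python sketch shows one way to build such a tally; the trial data are hypothetical and chosen only for illustration, and the function names are not taken from the study.

```python
from collections import Counter, defaultdict


def confusion_matrix(trials):
    """Tally responses per stimulus from (stimulus, response) pairs."""
    matrix = defaultdict(Counter)
    for stimulus, response in trials:
        matrix[stimulus][response] += 1
    return matrix


def percent_correct(matrix, stimulus):
    """Percentage of responses to a stimulus that match the stimulus."""
    row = matrix[stimulus]
    total = sum(row.values())
    return 100.0 * row[stimulus] / total if total else 0.0


# Hypothetical trials for illustration only; not data from the study.
trials = [("f", "f"), ("f", "p"), ("f", "f"), ("f", "f"),
          ("s", "s"), ("s", "s"), ("v", "b"), ("v", "v")]
matrix = confusion_matrix(trials)
for stimulus in sorted(matrix):
    print(stimulus, dict(matrix[stimulus]), percent_correct(matrix, stimulus))
```

Keeping the full row for each stimulus, rather than only the number correct, is what allows the direction of a confusion (for example, a fricative heard as a plosive) to be examined later.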
CHAPTER III

RESULTS  AND  DISCUSSION 
Analysis  of  Variance 

A summary  of  the  results  of  the  analysis  of  variance 
is  presented  in  Table  2. 

Main effects

Two main effects were found significant. First, the
Distortion factor was significant, indicating that clipping
does produce a decrement in consonant intelligibility. The
total percent of correct responses under the "pass" condition
was 94.6%; the corresponding score for clipped speech was
76.3%. This gross effect of clipping on all stimuli together
does provide an affirmative answer to the experimenter's first
question: are individual speech sounds affected by infinite
peak clipping?

The Consonant main effect was also significant. This
may be due to an inherent difference in consonant
intelligibility, whether clipped or undistorted. It may be noted
in Table 3 that for both /θ/ and /ð/, intelligibility scores
were lower than for other consonants under both conditions.
Table 2. Summary of analysis of variance.

Source                                    d.f.   Mean Square      F     Significance

Consonant (C)                               11       92.45       9.44       **
Observer (O)                                 9        3.18       0.33       N.S.
Talker (T)                                   2        0.05       0.01       N.S.
Distortion (D)                               1      387.20      39.55       **
CxO                                         99       13.07       1.29       *
CxT                                         22       52.05       5.14       **
CxD                                         11      142.71      14.08       **
OxT                                         18        2.23       0.22       N.S.
OxD                                          9       48.99       4.83       **
TxD                                          2      200.53      19.79       **
CxOxT                                      198        7.81       0.63       N.S.
CxOxD                                       99       20.57       1.65       *
CxTxD                                       22       80.19       6.44       **
OxTxD                                       18       26.69       2.14       **
CxOxTxD                                    198       12.45
Pooled residual a (OxT, CxOxT, CxOxTxD)    414        9.79
Pooled residual b (CxOxT, CxOxTxD)         396       10.13

Error term for main effects was pooled residual a.
Error term for two-way interactions was pooled residual b.
Error term for three-way interactions was CxOxTxD.

.01 level of significance  **
.05 level of significance  *
not significant            N.S.
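The relation between the mean squares, the pooled residuals, and the F ratios in Table 2 can be checked with a short Python sketch. The values are copied from the table; the pooling rule used here (weighting each mean square by its degrees of freedom) is an assumption about how the pooled residuals were formed, although it reproduces the tabled values of 9.79 and 10.13 to two decimals.

```python
# (d.f., mean square) pairs copied from Table 2.
TABLE_2 = {
    "C": (11, 92.45), "O": (9, 3.18), "T": (2, 0.05), "D": (1, 387.20),
    "CxO": (99, 13.07), "CxT": (22, 52.05), "CxD": (11, 142.71),
    "OxT": (18, 2.23), "OxD": (9, 48.99), "TxD": (2, 200.53),
    "CxOxT": (198, 7.81), "CxOxD": (99, 20.57), "CxTxD": (22, 80.19),
    "OxTxD": (18, 26.69), "CxOxTxD": (198, 12.45),
}


def pooled_mean_square(sources):
    """Pool mean squares, weighting each by its degrees of freedom."""
    total_df = sum(TABLE_2[s][0] for s in sources)
    total_ss = sum(TABLE_2[s][0] * TABLE_2[s][1] for s in sources)
    return total_df, total_ss / total_df


df_a, ms_a = pooled_mean_square(["OxT", "CxOxT", "CxOxTxD"])
df_b, ms_b = pooled_mean_square(["CxOxT", "CxOxTxD"])

f_consonant = TABLE_2["C"][1] / ms_a    # main effect vs. pooled residual a
f_distortion = TABLE_2["D"][1] / ms_a   # main effect vs. pooled residual a
f_cxd = TABLE_2["CxD"][1] / ms_b        # two-way interaction vs. pooled residual b
f_cxtxd = TABLE_2["CxTxD"][1] / TABLE_2["CxOxTxD"][1]  # three-way vs. CxOxTxD

print(df_a, round(ms_a, 2), df_b, round(ms_b, 2))
print(round(f_consonant, 2), round(f_distortion, 2),
      round(f_cxd, 2), round(f_cxtxd, 2))
```

Small differences in the last decimal place between these ratios and the tabled F values are to be expected, since the table reports mean squares rounded to two decimals.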
Table 3. Difference between treatments for each consonant.

                 Percent Correct
Consonant        Pass       Clip       Chi-square

s                98.8       94.6          1.67
θ                85.8       38.8        213.56
f                89.6       70.8         33.87
z                99.2       83.3         25.44
ð                65.8       27.5        141.56
v                97.9       74.6         52.45
p               100.0       87.5         15.05
t                99.6       91.3          6.69
k                99.6       83.8         24.15
b                99.2       84.2         21.68
d                99.6       87.9         13.11
g               100.0       91.3          7.38

With 11 df, chi-square needed for significance at
.05 - 19.7
.01 - 24.7
The main effects for Talker and for Observer were not
found to be significant. This was reassuring, since both of
these factors were potential sources of considerable
experimental error. Since over-all differences among talkers and
observers were not statistically significant, many of the
scores to be reported are based upon the pooled talker and
observer scores.

Interactions 

Despite the lack of significance of the Talker and
Observer main effects, there were several significant
interactions involving talkers and observers. Visual inspection
of the data indicated that the Consonant x Talker interaction
may have been due to a higher score of correct responses to
/θ/ as spoken by talker A than by the other talkers, and a
higher score on /d/ as spoken by talker C. Furthermore, the
Talker x Distortion interaction was significant because
talker C yielded a higher score than talkers A and B for the
"pass" condition but a lower score than the others under
clipping. These interactions involving talkers tend to
confirm the advisability of using more than one talker in a
study such as this. Divergent over-all scores under
clipping between two observers seemed to be the basis of the
significant Observer x Distortion interaction.

Of major importance in this study is the Consonant
x Distortion interaction. The significance of this
interaction provides an answer to the experimenter's second
question: are these effects equal? The effect of clipping
on individual sounds is not equal.

Individual  Comparisons 

Specific  consonants 

The difference between distortion treatments for each
consonant is presented in Table 3, in which the percentage
of correct responses is compared for each consonant. For
almost half of the consonants that were included as stimuli,
the intelligibility under clipping was not significantly
different than for undistorted speech. The sounds that were
affected little were /s g t d p/. Significantly different
at the .01 level were /ð θ v f/. Of the consonants that were
affected by clipping, the decrement in percent of correct
responses was as great as 47%. These relationships may be
visualized easily in Figure 4.
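The decrements discussed above follow directly from the two percentage columns of Table 3. The sketch below, written in Python with illustrative names, computes the decrement for each consonant and sorts the chi-square values against the two critical values given in the table note. The phonetic symbols are used as dictionary keys purely for readability.

```python
# (pass %, clip %, chi-square) copied from Table 3.
TABLE_3 = {
    "s": (98.8, 94.6, 1.67),   "θ": (85.8, 38.8, 213.56),
    "f": (89.6, 70.8, 33.87),  "z": (99.2, 83.3, 25.44),
    "ð": (65.8, 27.5, 141.56), "v": (97.9, 74.6, 52.45),
    "p": (100.0, 87.5, 15.05), "t": (99.6, 91.3, 6.69),
    "k": (99.6, 83.8, 24.15),  "b": (99.2, 84.2, 21.68),
    "d": (99.6, 87.9, 13.11),  "g": (100.0, 91.3, 7.38),
}
CRITICAL_05, CRITICAL_01 = 19.7, 24.7  # 11 d.f., from the table note


def significance(chi2):
    """Label a chi-square value against the two critical values."""
    if chi2 >= CRITICAL_01:
        return ".01"
    if chi2 >= CRITICAL_05:
        return ".05"
    return "N.S."


# Decrement = pass score minus clip score, in percentage points.
decrements = {c: round(p - q, 1) for c, (p, q, _) in TABLE_3.items()}
not_significant = sorted(c for c, (_, _, x) in TABLE_3.items()
                         if significance(x) == "N.S.")
worst = max(decrements, key=decrements.get)

for c in sorted(decrements, key=decrements.get, reverse=True):
    print(c, decrements[c], significance(TABLE_3[c][2]))
```

Sorting by decrement places the dental fricatives at the top of the list and /s/ at the bottom, which is the ordering the discussion relies on.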

Considering each treatment separately, the difference
from one consonant to another is shown in Table 4. This table
is arranged in two matrices; the upper matrix represents the
"pass" condition and the lower represents the "clip"
condition. In each matrix, the presence of an asterisk at the
intersect of a column and row indicates a statistically
significant difference between the two corresponding sounds in
the margins. A blank space indicates that for the treatment
under consideration, the difference between the two sounds
(with respect to the number of stimuli correctly identified)
is not statistically significant at the .01 level. Under
the "pass" condition, /ð/ is statistically different from
all of the other sounds. Under the "clip" condition, there
are four consonants that are uniformly less intelligible than
the others; these are /ð θ f v/. Reference to Table 4 makes
it possible to ascertain the significance of the difference
in intelligibility, under either treatment, between any two of
the test stimuli.

Figure 4. Intelligibility scores of consonants.
A. Open bar indicates undistorted condition
B. Closed bar indicates infinite peak clipped condition

Table 4. Matrix of significant differences between sounds
under each distortion treatment.
Sounds are arranged in order of decreasing intelligibility
under clipping. Significance level .01.

Results of the present study seem to support the
observations made by Licklider and Pollack that a speech
waveform can retain intelligibility even when stripped of relative
amplitude clues so that only the temporal pattern of zero-axis
crossings is retained. In the present study, data are
also presented which show that the extent to which this is
true depends upon the speech sample.¹ With contextual clues
controlled, the intelligibility of individual consonant sounds
in VCV syllables ranged from below thirty percent to scores
above ninety percent, for observers' first exposure to speech
distorted by infinite peak clipping. The differential effect
of clipping on individual consonants is emphasized by noting
the scores for the sounds that were the most affected and
those that were the least affected by clipping. Scores for
/θ/ and /ð/ were 38.8% and 27.5%; compared with the scores
obtained without distortion, these represent decrements of
47.0% and 38.3% respectively. The score of 95.0% for /s/
represents a decrement of only 3.6%, which was not
statistically significant.

¹ A pilot study was conducted as a calibration procedure
to determine the equivalence of the present experimental
situation to that of the Licklider and Pollack study. It was
concluded that the two were comparable. See Appendix B for
details.

Voiced-voiceless grouping

The value of chi-square associated with the grouped
scores of /s θ f p t k/ and grouped scores of /z ð v b d g/
under clipping was 0.36; the value necessary for significance
at the .05 level was 19.70. There were no significant
differences under clipping between pairs of voiced-voiceless
cognate sounds, as shown in Table 5. Thus, clipping produced
the same degree of effect on intelligibility in the voiced
and voiceless sounds. Since these groups are primarily
differentiated on the basis of the presence or absence of
low-frequency periodic excitation, it is to be expected that
clipping would not affect recognition of voicing.

Table 5. Differences between voiced-voiceless cognate pairs
subjected to infinite peak clipping.

Consonant Pair      Chi-square

s - z                 14.21
θ - ð                 12.19
f - v                  1.35
p - b                  1.07
t - d                  1.07
k - g                  5.43

With 11 df, value of chi-square needed for significance at
.05 - 19.7
.01 - 24.7

Fricative-plosive grouping

In comparing consonants grouped according to the
fricative-plosive classification, a value of chi-square of
46.14 was obtained; the value necessary for significance at
the .01 level was 24.70. It is apparent that the fricatives
were different, as a group, from the plosives.
Intelligibility of the fricatives was considerably lower.
However, this must be qualified; two of the fricatives, /s/ and
/z/, were relatively unaffected by clipping, while the other
four, /θ ð v f/, were severely degraded in intelligibility.
Consequently, it appears that the distinguishing
characteristics of fricatives are obscured more than those of plosives,
and furthermore, that the characteristics of the dental
fricatives are obscured more than those of the alveolar fricatives. This
provides an answer to the third question: is there a
systematic way in which sounds are affected by peak clipping? The
following reasons are offered in partial explanation:

1) Cavity resonance. The four severely affected
fricatives are formed at the teeth, so far forward in the mouth
that they are relatively free of oral cavity resonance
effects. The friction component of these sounds consists
only of the turbulence of the breathstream between tongue and
teeth or between lip and teeth, radiating into the air (plus
voicing for /ð/ and /v/). It is evident that without oral
cavity resonance there is little basis for distinguishing the
lingua-dental sounds from the labio-dental sounds.

2) Friction and vocalic components of fricatives. An
experiment performed at the Haskins Laboratories seems
especially relevant to this discussion. Harris (1958)
separated the friction components of fricative sounds in VC
syllables from the vocalic portions and combined the
friction component of one fricative with the vocalic portion
associated with a different fricative. She found that
identification of /s/ and /z/ could be made by observers entirely
on the basis of the friction component, but that
identification of /f v θ ð/ depended on the combination of both
friction and vocalic portions. Since in the present study /s/ and
/z/ retained such high intelligibility, it might be inferred
that the removal of relative amplitude clues did not affect
the transmission of information inherent in the friction
portion of fricatives, but that it did affect the vocalic portion.

3) Balance of noise. The /θ/ and /ð/ are characterized
by a low intensity noise, while the /s/ and /z/ are considered
to have a high intensity noise component (Jacobson, Fant, and
Halle, 1952). Since one characteristic of infinite peak
clipping is to equalize amplitudes to a uniformly high level, it
is possible that clipping might do more violence to the
naturally lower noise levels of /θ/ and /ð/ than to the normally
higher noise levels of /s/ and /z/.

4) Durational clues. Still another explanation can be
made on the basis of the clue of duration. While traditionally
the fricatives are considered longer in duration than plosives,
Miller and Nicely (1955) included /s/, /z/, and the lateral
sibilant fricatives in one class, representing the consonants
of longest duration, but the dental fricatives were placed in
the same category of duration as the plosives. Since
temporal characteristics would be unchanged by clipping, this
would provide an additional means of identifying the /s/ and
/z/. It would also help to explain certain errors that
occurred, in which /p t k/ were the responses to /f/ and to /v/.

5) Affrication in plosives. A factor related to
confusions between fricatives and plosives may be the
affrication that typically follows plosives in American speech. This
affrication, when amplified by peak clipping, might tend to
make the clipped plosives perceptually resemble the fricatives.

Order  of  treatment 

The possibility of an order effect based on the
procedure in which five observers received the "pass" condition
first and the other five received the "clip" condition first
was investigated. As Table 6 shows, the order of presentation
did not yield a significant difference for either the "pass"
or the "clip" condition.

Error  Responses 

Confusion  matrices  were  prepared  which  combined  results 
for  all  talkers  and  for  all  observers.  Figures  5 and  6 show 
types  of  responses  actually  made  to  the  stimuli  and  their 
frequency  of  occurrence.  The  individual  scores  shown  in  these 
figures  are  based  on  ten  observers'  pooled  responses  to  the 
eight  syllables  spoken  by  three  talkers  for  each  consonant-- 
a total  of  240  stimuli  per  consonant. 
Table 6. Effect of order of treatment for each treatment.

                 Order (Sum of Scores)
Treatment     Pass First     Clip First     Difference     Chi-square

Pass             1374           1350            24             1.60
Clip             1135           1061            74            15.26

With 9 df, value of chi-square needed for significance at
.05 - 16.9
.01 - 21.7

Figure 5. Confusion matrix of responses to undistorted consonants.

Figure 6. Confusion matrix of responses to consonants distorted by infinite
peak clipping.

Generally, under the "pass" condition most response
errors could be classified in the same voicing and manner of
release category as their respective stimuli. However, under
the "clip" condition the types of error were divergent. There
were very few instances of voiced-voiceless confusions, but
there were many confusions on the plosive-fricative dimension.
These tended to be unidirectional; that is, fricatives,
especially /θ/ and /ð/, were frequently identified as plosives,
but plosives were rarely heard as fricatives. There were
other interesting examples of confusions that were not
reciprocal: twenty-eight percent of the responses to /θ/ were /f/,
but only seven percent of the responses to /f/ were /θ/.
Similarly, forty-three percent of the responses to /ð/ were /v/,
but only nine percent of the responses to /v/ were /ð/. Under
the "pass" condition the confusions between /θ/ and /f/ were
approximately equal, but confusions between /ð/ and /v/
exhibited the same sort of asymmetry as under clipping.

There were many more types of error responses used by
observers under the "clip" than the "pass" condition. This
was more noticeable for the dental fricatives than for other
sounds.


CHAPTER  IV 


SUMMARY  AND  CONCLUSIONS 


Summary

An experiment was performed to determine whether
individual consonant sounds are affected differentially by
infinite peak clipping. The consonants, representing
plosives and fricatives, voiced and voiceless, and five
articulatory positions, were placed in Vowel-Consonant-Vowel
nonsense syllables. These stimuli were spoken by three trained
talkers following a predetermined random word order and were
tape recorded. Ten adult observers, trained in phonetics,
transcribed the consonants in the stimulus words under two
conditions: one in which the speech was passed through the
system undistorted, and one in which the speech was distorted
by infinite peak clipping.

The data were subjected to a factorial analysis of
variance to appraise the statistical significance of the
results. The main effect for Distortion treatment was
significant, as was the interaction between Distortion and
Consonant. This indicated that infinite peak clipping not only
affected the intelligibility of speech, but did
so differently for different consonant sounds. The main
effects reflecting Talker and Observer variance were not
significant, indicating a lack of systematic error from these
sources.

The dental fricatives were the most severely affected
by clipping. The plosives and /s/ and /z/ were the least
affected by clipping. Intelligibility scores for clipped
consonants ranged from 27.5% to 94.6%; the range for the
undistorted condition was from 65.8% to 100%. Sounds that
exhibited the poorest intelligibility when not distorted also
showed the lowest intelligibility under clipping. Voiced and
voiceless sounds were equally affected by clipping.

Conclusions

Infinite peak clipping affects the intelligibility of
individual speech phonemes.

The effect of infinite peak clipping on the
intelligibility of individual consonant sounds is different from one
sound to another.

There is a systematic pattern in the relative effects
of infinite peak clipping on intelligibility of individual
sounds. Dental fricatives show the largest decrement under
clipping; plosives and sibilant fricatives are the least
affected of the sounds sampled. Intelligibility of voiced
and voiceless sounds is equally affected by infinite peak
clipping.


APPENDIX  A 


DETAILS  OF  THE  CLIPPING  APPARATUS 


Three resistance-capacitance-coupled printed-circuit
amplifier modules were used in cascade. Clipping was achieved
by overdriving the amplifiers between saturation and cutoff,
as described by Littwin (1964). Each module consisted of one
emitter-follower stage driving two stages of common-emitter
amplification. A schematic of the amplifier modules is shown
in Figure 7.

Undistorted operation of each module separately and of
the three modules cascaded, when operated within their linear
range, was demonstrated by observing the input and output
waveforms on a Tektronix type 564 dual-beam storage
oscilloscope. These waveforms were photographed using a C-12
oscilloscope camera with Polaroid pack. In Figure 8 the upper traces
show a 1000 Hz tone at the output of a Hewlett-Packard model
201-C oscillator, and the lower traces show the waveforms at
the output of successive cascaded modules. In Figure 8A the
unit is shown operating within its linear range; in B it is
operating as a peak clipper. The action of the clipper on
samples of speech is shown in Figure 9. Photographs A and
B show waveforms at two different moments during utterance of
the word "mash," corresponding to the vowel and the final
consonant. The upper traces show the speech signal at the tape
playback output of an Ampex model 354 tape recorder; the lower
traces show the same samples monitored at the output of the
clipper.

Figure 7. Schematic of clipper amplifier module.

Figure 8. Output waveforms of modules.
A. Linear operation
B. Peak clipping

Figure 9. Action of clipper on speech waveforms.
A. Vowel
B. Consonant
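The effect of the clipper on a waveform can be illustrated with an idealized digital sketch in Python. This is not a model of the analog amplifier modules described in this appendix; it assumes perfect clipping, in which every sample is replaced by a fixed positive or negative level according to its sign. The test signal, sampling rate, and function names are hypothetical. The sketch shows that amplitude variation is removed while the positions of the zero-axis crossings are kept.

```python
import math


def infinite_peak_clip(samples, level=1.0):
    """Replace every sample by +level or -level according to its sign."""
    return [level if x >= 0.0 else -level for x in samples]


def zero_crossings(samples):
    """Indices i where the sign changes between sample i-1 and sample i."""
    return [i for i in range(1, len(samples))
            if (samples[i - 1] >= 0.0) != (samples[i] >= 0.0)]


# Hypothetical test signal: a 1000 Hz tone whose amplitude grows slowly,
# sampled at 16 kHz (both values chosen only for illustration). The half
# sample offset keeps any sample from falling exactly on zero.
RATE = 16000
signal = [(0.2 + 0.8 * n / 320) * math.sin(2 * math.pi * 1000 * (n + 0.5) / RATE)
          for n in range(320)]

clipped = infinite_peak_clip(signal)

# The clipped signal takes only two values, yet its sign changes occur at
# exactly the same sample positions as in the original signal.
print(sorted(set(clipped)))
print(zero_crossings(signal) == zero_crossings(clipped))
```

In this idealization the growing amplitude envelope of the test tone disappears entirely after clipping, which corresponds to the removal of relative amplitude clues discussed in the body of the study.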

The frequency response of each module and of the three
in tandem was found to be 60 to 12,000 Hz ± 3 dB. This was
determined by measuring the root mean square output voltage
of the clipper, with constant input, using pure tones
generated by a General Radio model 1304-B beat frequency audio
oscillator. Voltages were measured with a Ballantine model
321 A.C. Vacuum Tube Voltmeter.


APPENDIX  B 


SYSTEM  CALIBRATION  PILOT  STUDY 


Purpose

To determine if the present experimental situation was
comparable to that used by previous investigators, the
traditional procedures were replicated in a pilot study. In
particular, the intelligibility of undistorted and clipped PB
words was compared using materials described by Licklider and
Pollack (1948).

Stimulus  materials 

Lists one and two of the Harvard PB-50 words (Egan,
1948) were arranged in random word order. As in the Licklider
and Pollack (1948) study, one adult male talker who had
training in phonetics read both lists. The words were tape
recorded in a series 1200 Industrial Acoustics Corporation room
using an Altec M-20 condenser microphone coupled to an Ampex
model 350-C full-track tape recorder.

The stimulus words, each in the carrier phrase "Write
the word ____," were spoken as one continuous sentence.
During the recording, the talker monitored the deflections of a
VU meter associated with this phrase in order to maintain a
uniform level of vocal intensity among the words. The words
were separated by an interval of five seconds, and before each
ten words a number identifying the following stimulus was
given, using the phrase, "Next is number ____."

Apparatus 

The peak clipping unit consisted of three cascaded
printed-circuit amplifier modules operated beyond their linear
range. Details of this instrument are given in Appendix A.

A high quality audio playback system which is described in
Chapter II was used for presentation of stimulus materials.

Subjects 

Ten observers, three males and seven females, were
selected from the staff of the Communication Sciences Laboratory.
The observers ranged in age from nineteen to thirty-six years
and demonstrated normal hearing. Hearing was screened at 15
dB relative to the I.S.O. standard at test frequencies of 250,
500, 1000, 2000, 4000, and 6000 Hz, using a Beltone model 10 A
portable audiometer.

Procedure 

Experimental trials were conducted in a series 1200
Industrial Acoustics Corporation auditory test room. Observers
were seated in one of three chairs positioned at two and
one-half feet from the loudspeaker. Instructions were given to
write the words that they heard, using response forms that
were provided with spaces for the fifty words of a PB word list.

In order to avoid any systematic effect associated
with the difficulty of one PB list relative to another, the
distortion treatments and word lists were counterbalanced.
For the "pass" treatment (without peak clipping), five
observers heard List One, and the other five observers heard
List Two. For the "clip" treatment, each observer heard the
list that was not presented to him under the "pass" treatment.

Results 

Results are shown in Table 7. The mean intelligibility
score for the ten observers for the "pass" treatment was
95.8%; the mean score for the "clip" treatment was 68.8%. In
the Licklider and Pollack study (1948) the mean score for
five observers in the first test session was approximately
98% without distortion, and 64% with infinite peak clipping.

Conclusion 

With PB-50 lists as stimulus material, the apparatus
and procedure used in the present experiment duplicated
results obtained by Licklider and Pollack (1948). Intelligibility
scores obtained without distortion and with infinite peak
clipping were of the same order of magnitude as those reported in
the literature.


Table 7. Intelligibility scores for PB lists without
distortion and with infinite peak clipping.

Observers      No Distortion         Clipping

Group 1        List 1   95.2%        List 2   65.6%
Group 2        List 2   96.4%        List 1   72.0%
All                     95.8%                 68.8%
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Figure  10.  Matrix  of  responses  by  Observer  1 to  undistorted 
consonants. 


Letters  A,  B,  and  C identify  talkers. 
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Figure  11.  Matrix  of  responses  by  Observer  .1  to  consonants 
distorted  by  infinite  peak  clipping. 

Letters  A,  B,  and  C identify  talkers. 
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Figure  12.  Matrix  of  responses  by  Observer  2 to  undistorted 
consonants.  ot-uxteu 


Letters  A,  B 


and  C identify  talkers. 
Figure 13. Matrix of responses by Observer 2 to consonants
distorted by infinite peak clipping.


Letters  A,  B,  and  C identify  talkers. 
Figure 14. Matrix of responses by Observer 3 to undistorted
consonants.


Letters  A,  B,  and  C identify  talkers. 
Figure 15. Matrix of responses by Observer 3 to consonants
distorted  by  infinite  peak  clipping. 


Letters  A,  B,  and  C identify  talkers. 
Figure 16. Matrix of responses by Observer 4 to undistorted
consonants. 


Letters  A,  B,  and  C identify  talkers. 
Figure 17. Matrix of responses by Observer 4 to consonants
distorted by infinite peak clipping.

Letters A, B, and C identify talkers.
Figure 18. Matrix of responses by Observer 5 to undistorted
consonants.


Letters  A,  B,  and  C identify  talkers. 
Figure 19. Matrix of responses by Observer 5 to consonants
distorted  by  infinite  peak  clipping. 


Letters  A,  B,  and  C identify  talkers. 
Figure 20. Matrix of responses by Observer 6 to undistorted
consonants.

Letters A, B, and C identify talkers.
Figure 21. Matrix of responses by Observer 6 to consonants
distorted  by  infinite  peak  clipping. 

Letters  A,  B,  and  C identify  talkers. 
Figure 22. Matrix of responses by Observer 7 to undistorted
consonants.


Letters  A,  B,  and  C identify  talkers. 
Figure 23. Matrix of responses by Observer 7 to consonants
distorted  by  infinite  peak  clipping. 

Letters  A,  B,  and  C identify  talkers. 
Figure 24. Matrix of responses by Observer 8 to undistorted
consonants.

Letters A, B, and C identify talkers.


[Response matrix not legible in the source; axis label: Number of Responses.]

Figure  25.  Matrix  of  responses  by  Observer  8 to  consonants 
distorted  by  infinite  peak  clipping. 

Letters  A,  B,  and  C identify  talkers. 
Figure 26. Matrix of responses by Observer 9 to undistorted
consonants.

Letters A, B, and C identify talkers.
Figure 27. Matrix of responses by Observer 9 to consonants
distorted by infinite peak clipping.


Letters  A,  B,  and  C identify  talkers. 
Figure 28. Matrix of responses by Observer 10 to undistorted
consonants.

Letters  A,  B,  and  C identify  talkers. 
Figure 29. Matrix of responses by Observer 10 to consonants
distorted  by  infinite  peak  clipping. 


Letters  A,  B,  and  C identify  talkers. 